This paper presents a technique for training a robot to perform a kick-motion in AI Soccer using reinforcement learning (RL). In RL, an agent interacts with an environment and learns to choose an action in each state at every step. When training RL algorithms, a problem called the curse of dimensionality (COD) can occur if the dimension of the state is high and the amount of training data is small. The COD often degrades the performance of RL models. When the robot kicks the ball, it chooses its action based on information obtained from the soccer field as the ball approaches. To avoid the COD, the training data, which in RL are experiences, would have to be collected evenly from all areas of the soccer field over (theoretically infinite) time. In this paper, we use a relative coordinate system (RCS) as the state for training the kick-motion of the robot agent, instead of an absolute coordinate system (ACS). Using the RCS removes the need for the agent to know the (state) information of the entire soccer field, reduces the dimension of the state required to perform the kick-motion, and consequently alleviates the COD. The training based on the RCS is performed with the widely used Deep Q-Network (DQN) and tested in the AI Soccer environment implemented with the Webots simulation software.
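As a concrete illustration of the state reduction described above, the following is a minimal sketch (not the authors' code) of converting absolute field coordinates from the simulator into a robot-centred relative state that could feed a DQN; the variable names and the two-dimensional [distance, bearing] state are assumptions.

```python
import numpy as np

def relative_state(robot_xy, robot_theta, ball_xy):
    """Express the ball position in the robot's own frame.

    robot_xy    : (x, y) of the robot in absolute field coordinates
    robot_theta : robot heading in radians (absolute frame)
    ball_xy     : (x, y) of the ball in absolute field coordinates

    Returns a low-dimensional state [distance, bearing] that is
    independent of where on the field the interaction happens.
    """
    dx, dy = np.subtract(ball_xy, robot_xy)
    # Rotate the displacement into the robot's frame.
    cos_t, sin_t = np.cos(robot_theta), np.sin(robot_theta)
    rel_x = cos_t * dx + sin_t * dy
    rel_y = -sin_t * dx + cos_t * dy
    return np.array([np.hypot(rel_x, rel_y), np.arctan2(rel_y, rel_x)],
                    dtype=np.float32)

# The same relative state can arise anywhere on the field, so experiences
# need not cover every absolute position.
s = relative_state(robot_xy=(1.0, -0.5), robot_theta=0.3, ball_xy=(1.4, -0.2))
```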
This paper presents a data-driven optimal control policy for a micro flapping-wing unmanned aerial vehicle. First, a set of optimal trajectories is computed according to a geometric formulation of the dynamics that captures the nonlinear coupling between large-angle flapping motion and quasi-steady aerodynamics. Then, it is converted into a feedback control system within the framework of imitation learning. In particular, an additional constraint is incorporated into the learning process to enhance the stability of the resulting controlled dynamics. Compared with conventional methods, the proposed constrained imitation learning eliminates the need to generate additional optimal trajectories online without sacrificing stability, and the computational efficiency is thereby substantially improved. Furthermore, this constitutes the first nonlinear control system that stabilizes the coupled longitudinal and lateral dynamics of a flapping-wing aerial vehicle without relying on averaging or linearization. These results are illustrated by numerical examples in which the simulation model is inspired by the monarch butterfly.
Training agents via off-policy deep reinforcement learning (RL) requires a large memory, called replay memory, that stores past experiences used for learning. These experiences are sampled, uniformly or non-uniformly, to create the batches used for training. When calculating the loss function, off-policy algorithms assume that all samples are of the same importance. In this paper, we hypothesize that training can be enhanced by assigning a different importance to each experience, based on its temporal-difference (TD) error, directly in the training objective. We propose a novel method that introduces a weighting factor for each experience when calculating the loss function at the learning stage. In addition to improving convergence speed when used with uniform sampling, the method can be combined with prioritization methods for non-uniform sampling. Combining the proposed method with prioritization methods improves sampling efficiency while increasing the performance of TD-based off-policy RL algorithms. The effectiveness of the proposed method is demonstrated by experiments in six environments of the OpenAI Gym suite. The experimental results show that the proposed method reduces convergence time by 33%~76% in three environments and yields an 11% increase in returns and a 3%~10% increase in success rate in the other three environments.
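To make the weighting idea concrete, here is a minimal PyTorch-style sketch of weighting each sampled transition's loss by a function of its own TD error; the softmax weighting form and the temperature parameter below are assumptions, not necessarily the paper's exact weighting function.

```python
import torch

def weighted_td_loss(q_values, targets, temperature=1.0):
    """Per-sample weighted TD loss.

    q_values : Q(s, a) for the sampled batch, shape (B,)
    targets  : bootstrapped TD targets r + gamma * max_a' Q_target(s', a'), shape (B,)

    Each transition's squared TD error is weighted by a factor derived from
    the magnitude of its TD error (here a softmax over the batch, which is
    an assumed choice, not necessarily the paper's).
    """
    td_error = targets.detach() - q_values
    # Weights depend on |TD error| but must not propagate gradients.
    with torch.no_grad():
        weights = torch.softmax(td_error.abs() / temperature, dim=0) * td_error.numel()
    return (weights * td_error.pow(2)).mean()
```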
As the demand for autonomous driving increases, ensuring safety is paramount. Early accident prediction using deep learning methods for driving safety has recently gained much attention. In this task, given a dashcam video as input, the model predicts an accident as early as possible and also predicts a point indicating where the driver should look. We propose to exploit the double actors and regularized critics (DARC) method, for the first time, on this accident-forecasting platform. We draw inspiration from DARC because it is currently a state-of-the-art reinforcement learning (RL) model for continuous action spaces and is therefore suitable for accident anticipation. Results show that by utilizing DARC, we can make predictions 5\% earlier on average while improving on multiple precision metrics compared to existing methods. The results imply that our RL-based problem formulation could significantly increase the safety of autonomous driving.
The Fourier Neural Operator (FNO) is one of the physics-inspired machine learning methods; in particular, it is a neural operator. Recently, several types of neural operators have been developed, such as deep operator networks, GNO, and MWTO. Compared with other models, the FNO is computationally efficient and can learn nonlinear operators between function spaces independent of any particular finite basis. In this study, we investigate bounds on the Rademacher complexity of the FNO based on certain group norms. Using capacities based on these norms, we bound the generalization error of FNO models. In addition, we examine the correlation between the empirical generalization error and the proposed FNO capacities. Based on this investigation, we gain insight into the influence of the model architecture on the generalization error and estimate the amount of information stored in FNO models of various capacities.
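For readers unfamiliar with the quantity being bounded, a standard Rademacher-complexity generalization bound has the following generic textbook form (this is background, not the paper's group-norm-specific capacity bound for the FNO):

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% for every f in the hypothesis class \mathcal{F} with loss \ell bounded in [0, 1]:
\mathbb{E}\big[\ell(f)\big]
\;\le\;
\frac{1}{n}\sum_{i=1}^{n}\ell\big(f(x_i), y_i\big)
\;+\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{F})
\;+\; \sqrt{\frac{\log(1/\delta)}{2n}}
```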
In multi-goal reinforcement learning, an agent learns a policy for achieving multiple goals in an environment by exploiting experiences obtained from interactions with that environment. Training an agent with sparse binary rewards is particularly challenging due to the scarcity of successful experiences. To address this, hindsight experience replay (HER) generates successful experiences from failed ones. However, generating successful experiences without considering the properties of the achieved goals is less efficient. In this paper, a cluster-based sampling strategy exploiting the properties of achieved goals is proposed. The proposed sampling strategy groups achieved goals in a different way and samples experiences accordingly. For grouping, the k-means clustering algorithm is used; the centroids of the clusters are obtained from the distribution of failed goals, defined as the original goals that were not achieved. The proposed method is validated through experiments on three robotic control tasks from OpenAI Gym. The experimental results show that the proposed method significantly reduces the number of epochs required for convergence in two of the three tasks and slightly increases the success rate in the remaining task. It is also shown that the proposed method can be combined with other sampling strategies for HER.
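As a hedged sketch of the clustering step described above (not the authors' implementation), failed goals can be clustered with k-means and relabeling goals drawn per cluster; the even per-cluster sampling rule below is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_failed_goals(failed_goals, n_clusters=8, seed=0):
    """Group failed (unachieved) original goals with k-means.

    failed_goals : array of shape (N, goal_dim), the original goals of
                   episodes in which the agent did not succeed.
    Returns the fitted KMeans object, whose centroids summarize the
    distribution of failed goals.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(failed_goals)
    return km

def sample_relabel_goals(km, achieved_goals, per_cluster=4, rng=None):
    """Sample achieved goals evenly across the clusters of failed goals.

    achieved_goals : array of shape (M, goal_dim), goals actually reached
                     during past episodes (candidates for HER relabeling).
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = km.predict(achieved_goals)
    picks = []
    for c in range(km.n_clusters):
        idx = np.flatnonzero(labels == c)
        if idx.size:
            picks.append(rng.choice(idx, size=min(per_cluster, idx.size), replace=False))
    return achieved_goals[np.concatenate(picks)] if picks else achieved_goals[:0]
```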
Recently, as the demand for cleaning robots has steadily increased, household electricity consumption has also been rising. To address this power consumption problem, efficient path planning for cleaning robots has become important, and many studies have been conducted. However, most of them plan movement along simple path segments rather than complete coverage paths that clean every area. With emerging deep learning techniques, reinforcement learning (RL) has been adopted for cleaning robots. However, such RL models operate only in a specific cleaning environment rather than in various environments, so the model must be retrained whenever the cleaning environment changes. To solve this problem, the proximal policy optimization (PPO) algorithm is combined with efficient path planning that operates in various cleaning environments, using transfer learning (TL), detection of the closest cleaning tile, reward shaping, and an elite set method. The proposed method is validated through an ablation study and compared with conventional methods such as random and zigzag. The experimental results show that the proposed method improves training performance and increases the convergence speed over the original PPO. They also show that the proposed method performs better than the conventional methods (random, zigzag).
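A minimal sketch of the kind of reward shaping such a coverage task can use is given below; the reward terms and constants are illustrative assumptions, not the values used in the paper.

```python
def shaped_reward(cleaned_new_tile, dist_to_nearest_dirty_before,
                  dist_to_nearest_dirty_after, hit_obstacle, all_tiles_cleaned):
    """Illustrative shaped reward for a grid-world cleaning robot.

    cleaned_new_tile : True if the robot cleaned a previously dirty tile this step
    dist_to_nearest_dirty_* : distance to the closest uncleaned tile before/after
                              the move (ties into closest-tile detection)
    hit_obstacle : True if the move collided with a wall or obstacle
    all_tiles_cleaned : True if coverage is complete
    """
    reward = -0.01                     # small step penalty to encourage short paths
    if cleaned_new_tile:
        reward += 1.0                  # main objective: clean new tiles
    # Shaping term: reward progress toward the nearest uncleaned tile.
    reward += 0.1 * (dist_to_nearest_dirty_before - dist_to_nearest_dirty_after)
    if hit_obstacle:
        reward -= 0.5
    if all_tiles_cleaned:
        reward += 10.0
    return reward
```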
Domain generalization for semantic segmentation is highly demanded in practical applications, where a trained model is expected to work well in previously unseen domains. One challenge lies in the lack of data that could cover the diverse distributions of possible unseen domains at training time. In this paper, we propose a WEb-image assisted Domain GEneralization (WEDGE) scheme, which is the first to exploit the diversity of web-crawled images for generalizable semantic segmentation. To explore and exploit real-world data distributions, we collect a web-crawled dataset that exhibits large diversity in terms of weather conditions, sites, lighting, camera styles, etc. We also present a method that injects the web-style representations into the source domain during training, which enables the network to experience images of various styles with reliable labels for effective training. Furthermore, we use the web-crawled dataset with predicted pseudo labels for training to further enhance the network's capability. Extensive experiments show that our method clearly outperforms existing domain generalization techniques.
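One common way to "inject" the style of web-crawled images into source-domain features is to swap channel-wise feature statistics, as in AdaIN; the sketch below illustrates that general idea only and is an assumption about the mechanism, not the paper's exact formulation.

```python
import torch

def inject_style(source_feat, web_feat, eps=1e-5):
    """Replace the channel-wise statistics of source-domain features with
    those of a web-crawled image (AdaIN-style), so the network sees source
    content rendered in a web-image style during training.

    source_feat, web_feat : tensors of shape (B, C, H, W)
    """
    def stats(x):
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.var(dim=(2, 3), keepdim=True, unbiased=False).add(eps).sqrt()
        return mu, sigma

    mu_s, sig_s = stats(source_feat)
    mu_w, sig_w = stats(web_feat)
    normalized = (source_feat - mu_s) / sig_s
    return normalized * sig_w + mu_w
```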
3D-aware image synthesis focuses on preserving spatial consistency in addition to generating high-resolution images with fine details. Recently, the Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate generative NeRFs and show remarkable achievements, they cannot handle conditional and continuous feature manipulation in the generation procedure. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated, photorealistic, 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated on three image datasets: AFHQ, CelebA, and Cars. As a result, our model shows strong 3D consistency with fine details and smooth interpolation under conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF exhibits a Fr\'echet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis at a $\text{128}^{2}$ resolution. Additionally, we provide FIDs of the generated 3D-aware images for each class of the datasets, as it is possible to synthesize class-conditional images with $\text{C}^{3}$G-NeRF.
Cellular automata (CA) captivate researchers due to the emergent, complex, individualized behavior that simple global rules of interaction enact. Recent advances in the field have combined CA with convolutional neural networks to achieve self-regenerating images. This new branch of CA is called neural cellular automata [1]. The goal of this project is to use the idea of neural cellular automata to grow prediction machines. We place many different convolutional neural networks in a grid. Each conv-net cell outputs a prediction of what the next state will be and minimizes its predictive error. Cells receive their neighbors' colors and fitnesses as input, where each cell's fitness score describes how accurate its predictions are. Cells can also move to explore their environment, and some stochasticity is applied to movement.
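A minimal, self-contained sketch of the setup described above (hypothetical names; not the project's code): each grid cell holds a small convolutional network that predicts the next state of its local neighborhood and is scored by its predictive error.

```python
import torch
import torch.nn as nn

class PredictorCell(nn.Module):
    """One cell of the grid: a tiny conv net that predicts the next
    state of its 3x3 neighborhood from the current neighborhood."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, channels, kernel_size=3, padding=1),
        )

    def forward(self, neighborhood):          # (1, C, 3, 3)
        return self.net(neighborhood)

    def fitness(self, neighborhood, next_neighborhood):
        """Negative predictive error: a higher value means a more accurate cell."""
        pred = self.forward(neighborhood)
        return -nn.functional.mse_loss(pred, next_neighborhood).item()

# A grid of independent predictor cells; in the full project, neighbors'
# colors and fitness scores would be packed into the input channels.
grid = [[PredictorCell() for _ in range(8)] for _ in range(8)]
```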